METHOD AND SYSTEM FOR EFFICIENTLY
RETRIEVING INFORMATION FROM A DATABASE 



Background Of The Invention

Field Of The Invention 
The invention relates to a system and method for locating and retrieving 
information from very large databases. Such a system and method are particularly useful, 
for example, with electronic mail systems, which require fast retrieval of user directory 
information for routing large volumes of messages.

Discussion Of The Prior Art 
Management of information has become critical to modern civilization. 
Information that is collected together in an organized fashion is referred to as a database. 
A database file conventionally consists of a group of records, with each record then being 
subdivided into one or more fields. For example, a database for routing e-mail might
contain the database records: 

jsmith@company.com|Jane|Smith|password|/usr/js/mailbox 
jjones@company.com|John|Jones|secret|/usr/jj/mailbox 
In this example, each database record consists of five fields: an e-mail address field, a
first name field, a last name field, a mail password field, and a mailbox location field. The
record terminator is a newline character and the field terminator is the | character. Each
record may have a different location in memory. The record offset (i.e., the position of
the record in the database memory relative to a reference point) for the record containing
the term "jsmith@company.com" may be 0, for example, while the record offset for the
record containing the term "jjones@company.com" may be 54.

The information within one or more of the fields for a record may be used to 
locate and retrieve the record from the database. The field information used to retrieve a 
record is commonly called the key, and the field in which this key information is stored is 
called the key field. In the e-mail routing database described above, for example, the key 
might be the user's e-mail address. Thus, when someone wanted to retrieve a user's
mailbox location, he or she could employ the user's e-mail address to locate and retrieve
the database record containing the same e-mail address (i.e., a matching key) in its key
field. Preferably, a database is optimized so that record retrieval based on each record's
key is fast and efficient.

One method of locating and retrieving a record from a database is to sequentially 
access and search each record's key field until a matching key is located. This method of
record retrieval is referred to as a linear search. However, as the number of records in a
database increases, it is neither fast nor efficient to sequentially examine each record to
find the one with a matching key. To improve record retrieval speed in a larger database,
an index table is often built for the database. 

The use of index tables to improve database record retrieval speed is well known
in the art. One method of employing index tables is the indirect accessing method. With
the indirect accessing method, only a pointer list is accessed directly. Each pointer in the
list identifies the location of a record in memory, and the pointer's position in the list is
defined by that record's key. Thus, a key can be used to quickly obtain the pointer, and
thus the address, for the record with the matching key.

According to this method, the index table can directly index each record's pointer 
by that record's key. If the key information has a large number of possible values, 
however, then the index table will require a correspondingly large amount of memory. 
For this reason, index tables typically use a key-to-address transformation algorithm to
index the pointers. That is, the pointer for each record is indexed by a mathematical
transformation of the record's key, rather than by the key itself. 

The key-to-address transformation is often performed using a hash (or hashing) 
function. A hash function is any process that maps data to a numerical value. For 
example, one hash function may convert the characters of a key into their ASCII values,
add the ASCII values, and then divide the sum by a prime number to
produce a remainder as the hash value. Because the hash function can be selected to limit
the maximum possible hash value of a key, indexing the records against hash values 
reduces the amount of memory required for the index table. 
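
The additive hash described above can be sketched in a few lines. This is only an illustration: the function name additive_hash and the prime 251 are assumptions chosen for the example, not values taken from any particular prior art system.

```python
def additive_hash(key: str, prime: int = 251) -> int:
    """Add the ASCII values of the key's characters and keep the
    remainder after division by a prime (251 is a hypothetical choice)."""
    return sum(key.encode("ascii")) % prime


# The remainder is always below the prime, so an index table with
# `prime` slots is enough no matter how many distinct keys exist.
for key in ("jsmith@company.com", "jjones@company.com"):
    print(key, "->", additive_hash(key))

# Two keys made of the same characters in a different order add up to
# the same sum, so they necessarily receive the same hash value.
print(additive_hash("ab") == additive_hash("ba"))
```

Limiting the hash to the range 0 to prime-1 is what keeps the table small; the price is that distinct keys can share a hash value.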

The use of a hash function presents an additional problem, however. A hash
function may generate the same hash value for two different keys from two different
records. This is referred to as a collision. When this occurs, the hash value cannot be
used to uniquely identify the location of the record in the database. 

Several methods for resolving such collisions are described in the prior art. The
separate chaining method creates a linked list of records whose keys have the same hash
value. Once a hash value is obtained from a key during a search, each linked record for
that hash value can be reviewed until a matching key is found. With the linear probing
method, the hash value identifies a specific location in the index table. If this location
does not contain a matching key (or an address for a record with a matching key), each
subsequent memory location is probed until a matching key (or an address for a record
with a matching key) is found.
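
A minimal sketch of linear probing follows. It assumes a small in-memory table that stores the key next to the record address, reuses the additive hash as the starting location, and wraps around at the end of the table; the names probe_insert and probe_lookup are hypothetical.

```python
def start_slot(key: str, size: int) -> int:
    # Additive ASCII hash, used here only to pick the first location.
    return sum(key.encode("ascii")) % size


def probe_insert(table: list, key: str, address: int) -> int:
    """Store (key, address) in the first free slot at or after the hash slot."""
    size = len(table)
    for step in range(size):
        slot = (start_slot(key, size) + step) % size
        if table[slot] is None or table[slot][0] == key:
            table[slot] = (key, address)
            return slot
    raise RuntimeError("index table is full")


def probe_lookup(table: list, key: str):
    """Probe successive slots until the key or an empty slot is found."""
    size = len(table)
    for step in range(size):
        slot = (start_slot(key, size) + step) % size
        if table[slot] is None:
            return None               # empty slot: the key is not indexed
        if table[slot][0] == key:
            return table[slot][1]
    return None


table = [None] * 7                    # table size must exceed the record count
probe_insert(table, "ab", 0)
probe_insert(table, "ba", 54)         # same hash as "ab": stored one slot further on
print(probe_lookup(table, "ba"))
```

Note that the key itself sits in the table here, which is exactly the memory cost the surrounding discussion points out.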

The double hashing method extends the linear probing method to avoid the 
problem of clustering that can make linear probing slow for tables that are nearly full. 
The double hashing method uses two different hash functions. The first hash function 
identifies a specific location in the index memory, and the second hash function identifies
a further address offset from that initial location.

As the number of records in a database increases, however, these methods for 
collision-resolution become less efficient. The number of collisions increases with the 
size of the database, causing the amount of memory required to implement the
collision-resolution methods to increase as well. Also, collision-resolution requires access to the
record's key for comparison with the search key. In the case of indirect access, retrieval
of the actual record for key comparison degrades the performance. While the key may be 
stored in the index table for ready comparison when collisions occur, this alternative 
significantly increases the size of the index table. 

Further, the hashing function itself becomes more difficult to implement as the
number of records in a database increases. For the open addressing methods, such as
linear probing, the index table size must be greater than the number of records in the 
database. For the commonly used "remainder of division" hash function, the size of the 
hash table should be prime, and computing the hash value for long keys can be expensive 
in terms of processing time. 

Summary Of Invention

The invention provides a method and system for creating and using database 
index tables that may offer many advantages. For example, index tables according to the 
invention may be used to index even very large databases with only a small number of 
collisions between hash values. Further, the index tables of the invention use hash values
that can be quickly and efficiently processed. Moreover, the space required to store the 
index table is very small. 

According to one embodiment of the invention, two different cyclical redundancy
check (CRC) values are obtained for each record key to be indexed. The first CRC value
defines the portion of the index table in which the record's address is stored. The second
CRC value is then stored with the record's address in the table. To then retrieve a record
based upon a search key, the key's first CRC value narrows the portion of the index table
to be searched. Once a specific portion of the index table is identified with the first CRC
value, that portion of the table can be sequentially searched until the second CRC value for
the search key is located. The second CRC value then identifies the memory address of
the record containing a matching record key. Because CRC values are collision resistant,
the chance of collision within an index table using at least two types of CRC values is
very low. Further, because CRC values can be quickly calculated using tables, the
processing time for calculating an index value is reduced. Also, in addition to using CRC
values obtained from a record's key to index that record's address, various embodiments
of the invention also employ CRC values from a record's key to position that record
among a plurality of different storage media.






Description Of The Drawings 

Fig. 1 illustrates a database index table according to one embodiment of the
invention.

Figs. 2A and 2B illustrate one method of generating the database index table of
Fig. 1.

Figs. 3A and 3B illustrate a method of searching the database index table shown
in Fig. 1.

Fig. 4 illustrates a plurality of different storage media for storing a database
according to the invention.

Detailed Description Of Preferred Embodiments Of The Invention 

Outside of the field of database management, in the field of data communications,
various methods have been used to ensure that data has been accurately received or
retrieved from storage. One of these methods employs polynomial division to compute a
cyclical redundancy check (CRC) value. The CRC value is used to verify that no errors
have occurred in a block of data as a result of transmission or storage. According to this
method, the polynomial representation of the data is divided, modulo 2, by a preselected
generator polynomial. The remainder of this division is the CRC value, which is then
transmitted with the data. The
recipient of the data divides the polynomial representation of the received communication
by the same generator polynomial, and confirms that the newly calculated CRC
value is the same as the transmitted CRC value. In this way, the recipient can ascertain if
the transmitted data has been corrupted.
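
The modulo-2 division can be sketched bit by bit as shown below. The sketch assumes the generator X^16 + X^12 + X^5 + 1 (written 0x1021 with the leading term implied), an initial register value of zero, and most-significant-bit-first processing; these conventions, and the function name crc_remainder, are assumptions made for illustration.

```python
def crc_remainder(data: bytes, poly: int = 0x1021) -> int:
    """Remainder of the message polynomial divided, modulo 2, by the generator."""
    crc = 0
    for byte in data:
        crc ^= byte << 8                      # bring the next 8 message bits in
        for _ in range(8):
            if crc & 0x8000:                  # leading term set: subtract (XOR) the generator
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


# Sender: compute the CRC value and transmit it with the data.
message = b"jsmith@company.com"
sent_crc = crc_remainder(message)

# Recipient: recompute the CRC value and compare it with the transmitted one.
received_ok = message
received_bad = b"jsmitH@company.com"          # one bit changed in transit
print(crc_remainder(received_ok) == sent_crc)     # True
print(crc_remainder(received_bad) == sent_crc)    # False
```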

Cyclical redundancy check (CRC) computations are typically collision resistant
(i.e., they do not usually produce the same check value for different blocks of data). As it
turns out, the characteristic that makes CRC computations very good at detecting
communication errors also makes them useful as a hashing function for generating a
database index table, because they give an even distribution across the key space.
According to the invention, then, a CRC computation is used as the hashing function to
produce a database index table according to any conventional method. More preferably,
two different types of CRC computations are performed on each key of a database to
produce a database index table.

The prior art recognizes at least three CRC generator polynomials as being
particularly good at detecting differences in blocks of data. The first is the CRC-CCITT,
set forth by the Comité Consultatif International Téléphonique et Télégraphique,
defined as:

X^16 + X^12 + X^5 + 1.

The second is the CRC-16, defined as:

X^16 + X^15 + X^2 + 1.

The third is the CRC-32, defined as:

X^32 + X^26 + X^23 + X^22 + X^16 + X^12 + X^11 + X^10 + X^8 + X^7 + X^5 + X^4 + X^2 + X + 1.

As noted above, each of these generator polynomials is particularly collision resistant.
Moreover, table driven methods have been developed that allow the fast computation of
CRC-CCITT, CRC-16, and CRC-32 values at a minimal cost in terms of processing time.
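
A table driven computation precomputes the division result for every possible byte, so that each key byte costs one table lookup instead of eight shift-and-XOR steps. The sketch below makes assumptions the text does not state: the CRC-CCITT is processed most significant bit first, the CRC-16 is processed least significant bit first (reflected polynomial 0xA001), and both registers start at zero. The function and table names are hypothetical.

```python
def make_table_msb_first(poly: int) -> list:
    """256-entry table for a 16-bit CRC processed most significant bit first."""
    table = []
    for byte in range(256):
        crc = byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
        table.append(crc)
    return table


def make_table_lsb_first(reflected_poly: int) -> list:
    """256-entry table for a 16-bit CRC processed least significant bit first."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ reflected_poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


CCITT_TABLE = make_table_msb_first(0x1021)   # X^16 + X^12 + X^5 + 1
CRC16_TABLE = make_table_lsb_first(0xA001)   # X^16 + X^15 + X^2 + 1, bit-reversed


def crc_ccitt(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc = ((crc << 8) & 0xFFFF) ^ CCITT_TABLE[(crc >> 8) ^ byte]
    return crc


def crc16(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc = (crc >> 8) ^ CRC16_TABLE[(crc ^ byte) & 0xFF]
    return crc


key = b"jsmith@company.com"
print(f"CRC-CCITT = {crc_ccitt(key):04x}, CRC-16 = {crc16(key):04x}")
```

Under these conventions the two functions reproduce the commonly published check values for the ASCII string "123456789" (0x31C3 and 0xBB3D respectively).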

The CRC-CCITT and CRC-16 computations each produce a two-byte CRC value.
(The CRC-32 computation, on the other hand, produces a four-byte value, as will be
discussed further below.) Thus, combining the two two-byte CRC values generated by
the CRC-CCITT and CRC-16 computations on a record's key generates a composite
four-byte hash value, which represents a binary signature of the record's key. Because both
the CRC-CCITT and CRC-16 computations give a very good distribution for any range of
possible keys, and together they can produce up to 2^32 possible values, the probability of two
different keys having the same composite CRC-CCITT/CRC-16 hash value is very small.
In practice, the composite four-byte CRC-CCITT/CRC-16 hash value will almost always
uniquely identify a record's key. According to more preferred embodiments of the
invention, composite four-byte CRC-CCITT/CRC-16 hash values may be used to create a
database index table according to any conventional method.

The use of composite CRC-CCITT/CRC-16 four-byte hash values with
conventional database index methods, however, may require large amounts of memory.
For example, to employ a four-byte CRC-CCITT/CRC-16 hash value with the indirect
accessing method requires a hash index table with 2^32 entries. If each address pointer
(e.g., an offset to the record in the database) in the table were then a four-byte number, the
size of the required index table would be 16 GB.

Accordingly, other preferred embodiments of the invention may use a hybrid
method for indexing the address pointers. This hybrid method includes both indirect
addressing and linear searching. According to this method, the composite
CRC-CCITT/CRC-16 four-byte hash value is divided up into its two two-byte values. The
CRC-CCITT value is used for indirect addressing, as with a classic hash index table,
while the CRC-16 value is then used as a "key" in a linear search.
More particularly, the address translation of the CRC-CCITT value of the key
provides a "start" location in the index table. A linear search for an indexed CRC-16
value matching the CRC-16 value of the key then begins from this start location. This
linear search is efficient because the data to be searched is sequential and minimal in size.
The four-byte offset for each record in the database is stored with the CRC-16 value of its
key. Thus, when an indexed CRC-16 value matching the CRC-16 value of the key is
found, the associated four-byte offset is used to retrieve the record. The search key is
then compared with the record's key, to confirm that they are the same. If the keys do not
match, a collision has occurred (because two keys in the database have the same hash
value) and the search of the index table continues until the correct record is retrieved.
Because the four-byte hash value is almost always unique, however, collisions rarely occur.

A database index table 101 that implements this hybrid method is shown in Fig. 1.
The database indexed by this particular embodiment of the invention uses variable width
database records to reduce the amount of memory space required, but fixed width records
may alternately be employed as will be discussed below. The index table is initially
formed by a plurality of fixed-size index table clusters 103. Using fixed-size clusters
permits easier record addition and deletion, but variable-size clusters may alternately be
used to reduce the amount of required memory space or to reduce overflow problems.
The particular embodiment shown in Fig. 1 has 2^16 (i.e., 65,536) initial data
clusters 103_0 to 103_65535 (i.e., data clusters created to initially form the index table).
Thus, the index conveniently has one initial data cluster 103 corresponding to each
possible CRC-CCITT value. Other embodiments may employ fewer or greater numbers of
initial data clusters 103, however. Further, as will be explained in detail below,
additional overflow data clusters 103 may be added to the index table.

Each index table cluster 103 shown in Fig. 1 contains an array of four entries
105a-105d, but the number of entries K may be varied to optimize the performance of the
index table 101, as will be explained in detail below. Each entry 105 in the index table
has one two-byte CRC-16 field 107 and one four-byte record offset field 109. In this
particular embodiment, unused entries have the CRC-16 field 107 set to 0 and the offset
field 109 set to its maximum value (MAXVALUE), but other values can alternately be
employed.

While the entries 105a-105c of each index table cluster are available to store CRC
values and record offsets, the last entry 105d of each index table cluster 103 is reserved
for use as a link pointer. When all but the last entry 105d in an index table cluster 103 are
filled, and a new record offset value needs to be added to the index table cluster, an
overflow index table cluster 103 is created. The address of this overflow index table
cluster 103 is then stored in the last entry 105d, and the new record offset value is stored
in the first entry 105a of the overflow index table cluster. For example, as shown in Fig.
1, when entries 105a-105c of initial index table cluster 103_1 are filled, the overflow index
table cluster 103_1A is created, and the starting address of this overflow index table cluster
103_1A is stored in entry 105d of initial index table cluster 103_1. As will be understood by
those of ordinary skill in the art, if entries 105a-105c of the overflow index table cluster
103_1A are filled, a second overflow index table cluster 103_1B (not shown) is created, and
the starting address of this overflow index table cluster 103_1B is stored in entry 105d of
the previous index table cluster 103_1A. As will also be appreciated by those of ordinary skill
in the art, additional overflow index table clusters can be created and linked to full
overflow index table clusters as
necessary. Thus, additional CRC-16 values and their corresponding record offsets that
cannot be stored in a full initial index table cluster 103 can be stored in an overflow index
table cluster 103 directly or indirectly linked to the first index table cluster 103.
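
The entry and cluster layout described above can be sketched with fixed-width binary packing. The little-endian byte order, the concrete MAXVALUE of 0xFFFFFFFF, and the helper names are assumptions made for this illustration; the text specifies only the field widths and the role of the last entry.

```python
import struct

ENTRY = struct.Struct("<HI")     # two-byte CRC-16 field 107, four-byte offset field 109
K = 4                            # entries per cluster, as in Fig. 1
MAXVALUE = 0xFFFFFFFF            # assumed maximum value of the four-byte offset field
UNUSED = ENTRY.pack(0, MAXVALUE)


def empty_cluster() -> bytearray:
    """A cluster whose K entries are all marked unused."""
    return bytearray(UNUSED * K)


def entries(cluster) -> list:
    """Decode a cluster into a list of (crc16, offset) pairs."""
    return [ENTRY.unpack_from(cluster, i * ENTRY.size) for i in range(K)]


cluster = empty_cluster()
# Entries 105a-105c hold (CRC-16, record offset) pairs; values here are made up.
ENTRY.pack_into(cluster, 0, 0x1A2B, 54)
# Entry 105d is reserved as the link: its offset field holds the address of an
# overflow cluster, here one appended right after the 65,536 initial clusters.
ENTRY.pack_into(cluster, (K - 1) * ENTRY.size, 0, 65536 * K * ENTRY.size)

for crc, offset in entries(cluster):
    print(f"crc16={crc:#06x} offset={offset}")
```

With these widths each entry occupies 6 bytes, so a four-entry cluster occupies 24 bytes.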

The method for creating the index table 101 will now be described with reference
to Figs. 2A and 2B. First, in step 201, a file is created with 2^16 (i.e., 65,536) empty initial
index table clusters 103. In other words, initial index table clusters 103_0 to 103_65535 are
preallocated, one for each possible two-byte CRC-CCITT value. Then, in step 203, the
empty entries of the index table clusters are initialized to contain the value 0 in the
CRC-16 field and the maximum value in the offset field. Of course, those of ordinary skill in
the art will appreciate that the steps 201 and 203 may be combined into a single index
table cluster creation step.

Once the index table clusters have been generated and initialized, the first
database record (or, if the process of creating the table has already been proceeding, the
next database record) and its offset value within the database are read from the database
in step 205. The two-byte CRC-CCITT value for the record's key is then calculated in
step 207, while the two-byte CRC-16 value for the record's key is calculated in step 209.
Again, those of ordinary skill in the art will appreciate that the order of these steps may be
reversed, or that these steps may be combined into a single step.

In step 211, the CRC-CCITT value is transformed into an index file offset by the
equation:

Index(N) = N * K * B

where N is the CRC-CCITT value of the record's key, K is the number of entries in an
index table cluster, and B is the number of bytes per entry. Thus, Index(N) gives the
address offset of the initial index table cluster 103 corresponding to the computed
CRC-CCITT value N of the record's key. It should be noted that this formula for determining
Index(N) is based upon the assumption that the smallest addressable memory unit is one
byte, as is conventional. Other embodiments of the invention may alternately employ
memories with any discretely addressable memory unit size, however. Thus, the formula
above may be more generically written as

Index(N) = N * K * L

where L is the number of addressable memory locations within an entry of the index table
cluster.
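
As a worked example of this formula, the sketch below evaluates Index(N) for the Fig. 1 layout, taking K = 4 entries per cluster and B = 6 bytes per entry (a two-byte CRC-16 field plus a four-byte offset field). The function name index_offset is hypothetical.

```python
def index_offset(n: int, k: int, b: int) -> int:
    """Index(N) = N * K * B: byte offset of the initial cluster for CRC-CCITT value N."""
    return n * k * b


K, B = 4, 6                                  # Fig. 1 embodiment: 4 entries of 6 bytes

print(index_offset(0, K, B))                 # 0: first initial cluster
print(index_offset(1, K, B))                 # 24: second initial cluster
print(index_offset(0xFFFF, K, B))            # 1572840: last initial cluster

# Size of the file holding only the 65,536 initial clusters.
print(index_offset(65536, K, B))             # 1572864 bytes for K = 4
print(index_offset(65536, 100, B))           # 39321600 bytes for K = 100
```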

Next, in step 213, each entry in this first index table cluster 103 is sequentially
checked to see if the index table cluster 103 contains an available unused entry. If it does,
then in step 215 the CRC-16 value and record address for the record are stored in the first
available unused entry of this initial index table cluster 103, and the process continues to
step 225. (As previously noted, while the last entry of an index table cluster may be
unused, it is not available to store a CRC-16 value and record address.)

If the first index table cluster 103 does not contain an available unused entry, then
the last entry 105d is checked in step 217 to see if its offset field 109 contains an offset
address for a linked overflow index table cluster. If it does, then the processes of steps
213 through 217 are performed for this second (and any subsequent) index table cluster
103. If, on the other hand, the last entry 105d of the first (or any subsequent) index table
cluster 103 is empty, then a new index table cluster 103 is created in step 219, and the
offset address of this new index table cluster is stored in the offset field 109 of the last
entry 105d of the previously searched index table cluster 103 in step 221. Then, in step
223, the CRC-16 value and record address for the most-recently read record are stored in
the first entry of this newly created index table cluster. Lastly, in step 225, it is
determined whether the most-recently read record was the last record in the database. If it
was, then the process of creating the index table 101 ends. If it was not the last record in
the database, then the process returns to step 205 to continue the process with the next
unread record in the database. Thus, the process of creating the index table 101 continues
until all records of the database have been indexed.

A method for using the index table 101 will now be described with reference to
Figs. 3A and 3B. When a user provides a search key, the two-byte CRC-CCITT value for the search
key is first calculated in step 301, while the two-byte CRC-16 value for the search key is
calculated in step 303. In step 305, the CRC-CCITT value is transformed into an index
file offset by the equation:

Index(N) = N * K * B

where N is the CRC-CCITT value of the search key, K is the number of entries in an
index table cluster, and B is the number of bytes per entry. Then, in step 307, the
CRC-16 field of each entry in the addressed index table cluster 103 is sequentially searched
until an entry is found with a matching CRC-16 value. If it is determined in step 309 that
an entry with a matching CRC-16 value has been found, then the corresponding record
address in that entry is used to retrieve a record from the database in step 311. If an entry
with a matching CRC-16 value is not found, then in step 313 the offset address for an overflow
index table cluster (stored in the offset field 109 of the last entry 105d) is used to repeat
the search in the overflow index table cluster. Steps 307, 309 and 313 are repeated until
an entry with a matching CRC-16 value is found and the corresponding record retrieved
in step 311, or until there are no further index table entries 105 to be searched. If there
are no further index table entries 105 to be searched and a matching CRC-16 value has not been
found, then the process returns an error message (not shown) indicating that the desired
record is not in the database.
On the other hand, once the record has been retrieved in step 311, its key is
compared with the search key in step 315 to confirm that they are the same. If the search
key and the retrieved record key are the same, then the retrieved record is provided to the
user in step 317. If the search key and the retrieved record key are not the same, the
process returns to repeat steps 307-315 until a record with a matching key is retrieved and
provided to the user in step 317 or, alternately, until an error message is returned to the
user.

Thus, an index file according to some preferred embodiments of the invention is a
hash table based on CRC-CCITT values. It consists of linked lists of fixed size index
table clusters containing entries for each record's key's CRC-16 value and record offset.
Since all entries within a linked list of index table clusters have the same computed
CRC-CCITT value, only the CRC-16 values need to be compared once the appropriate initial
index table cluster has been identified. If an entry's CRC-16 value matches the CRC-16 value
computed for
the search key, the corresponding record is retrieved from the database file using the
record's offset stored with the CRC-16 value. The search key is then compared to the key
of the retrieved record. If the keys do not match, a double collision has occurred and the
search continues at the next index table entry.
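
The index creation and search procedures described above can be sketched together as follows. This is an in-memory illustration under stated assumptions, not the programs of the appendices: the index file is a bytearray, the database is a bytes object, both CRC registers start at zero, entries are little-endian, an entry is treated as unused when its offset field equals MAXVALUE, and records are never deleted. All function names are hypothetical.

```python
import binascii
import struct

ENTRY = struct.Struct("<HI")            # CRC-16 field, record offset field
K = 4                                   # entries per cluster; the last is the link
MAXVALUE = 0xFFFFFFFF
UNUSED = ENTRY.pack(0, MAXVALUE)
CLUSTER_BYTES = K * ENTRY.size


def crc_ccitt(key: bytes) -> int:
    return binascii.crc_hqx(key, 0)     # generator 0x1021, initial value 0 (assumed)


def crc16(key: bytes) -> int:
    crc = 0                             # generator 0x8005 in reflected form 0xA001
    for byte in key:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def add(index: bytearray, key: bytes, offset: int) -> None:
    base = crc_ccitt(key) * CLUSTER_BYTES                  # Index(N) = N*K*B
    while True:
        for i in range(K - 1):                             # data entries only
            pos = base + i * ENTRY.size
            if ENTRY.unpack_from(index, pos)[1] == MAXVALUE:
                ENTRY.pack_into(index, pos, crc16(key), offset)
                return
        link_pos = base + (K - 1) * ENTRY.size
        link = ENTRY.unpack_from(index, link_pos)[1]
        if link == MAXVALUE:                               # create an overflow cluster
            link = len(index)
            index.extend(UNUSED * K)
            ENTRY.pack_into(index, link_pos, 0, link)
        base = link                                        # continue in the linked cluster


def build_index(db: bytes) -> bytearray:
    index = bytearray(UNUSED * (K * 65536))                # one cluster per CRC-CCITT value
    offset = 0
    for record in db.splitlines(keepends=True):
        add(index, record.split(b"|", 1)[0], offset)
        offset += len(record)
    return index


def find(index: bytearray, db: bytes, key: bytes):
    base, wanted = crc_ccitt(key) * CLUSTER_BYTES, crc16(key)
    while base != MAXVALUE:
        for i in range(K - 1):
            crc, offset = ENTRY.unpack_from(index, base + i * ENTRY.size)
            if offset == MAXVALUE:
                return None                                # no further entries to search
            if crc == wanted and db[offset:db.index(b"|", offset)] == key:
                return offset                              # keys compared to rule out a collision
        base = ENTRY.unpack_from(index, base + (K - 1) * ENTRY.size)[1]
    return None


db = (b"jsmith@company.com|Jane|Smith|password|/usr/js/mailbox\n"
      b"jjones@company.com|John|Jones|secret|/usr/jj/mailbox\n")
index = build_index(db)
hit = find(index, db, b"jjones@company.com")
print(db[hit:db.index(b"\n", hit)].decode())
print(find(index, db, b"nobody@company.com"))              # None
```

Testing the offset field rather than the CRC-16 field for the unused marker is a choice made in this sketch, since 0 is itself a possible CRC-16 value.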

As mentioned above, the size K of the index table clusters 103 can be adjusted to
improve the performance of the search or to reduce disk space. If K is too large for the
number of records in the database, the index file 101 consumes more disk space than is
necessary because most index table clusters 103 are only partially used. If, on the other
hand, K is too small, then many overflow index table clusters 103 will need to be created
and linked. Because every link that is traversed requires a disk access to the index file
101, performance degrades as the number of linked clusters 103 increases. To change K,
the index file 101 must be regenerated, as all index table clusters are preferably the same
size.

To document the performance of the above-described embodiment of the
invention, the following test results were obtained from a Sun Microsystems Enterprise
E4500/E5500 computer with 2.0 GB of main memory and 4 CPUs, each operating at 400
MHz. The operating system was SunOS version 5.6, and the tests were performed during
idle periods to minimize the impact of other users on the test results. A test database file
was constructed on a local disk. The database contained 20,000,000 records of 100 bytes
each. The resulting database file size was 2 GB. The record's key field was a 7-character
hexadecimal string representation of the record number. The database records were
sequentially ordered from "0000000" to "1312cff".

According to the embodiment of this invention described in detail above, an index
file was created with the index table cluster size K set to 100. The resulting index file
size was 141 MB, or approximately 7% of the database file. Creating the index file
required about 45 minutes on this system, and the average rate at which records were added
to the index file was 7,407 records/sec.

To compare methods, three test programs were evaluated. The first program,
gethash, simulated the classic hash table approach. By ignoring the second hash value
(the CRC-16 value) in the index file, it resolved collisions by retrieving records with the same
CRC-CCITT value until it found the correct record. The second program, getrec, used the
method according to this invention described in detail above. By comparing the second
hash value (the CRC-16 value), it virtually eliminated collisions.

The last program, getdirect, represented the theoretical minimal solution. It did
not use an index file. Instead, it transformed the search key into the record number and
retrieved the record directly from the database file. This approach worked because the
test database's key field encoded the record number as a 7-character hexadecimal string,
and fixed length records were used. Thus, the offset used to retrieve a record was the
integer equivalent of the search key, multiplied by 100.
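
The direct transformation used by getdirect amounts to one line of arithmetic. The sketch below is a Python illustration of that calculation only; the function name direct_offset is hypothetical and it is not the listing from the appendices.

```python
RECORD_SIZE = 100                      # fixed record length of the test database, in bytes


def direct_offset(search_key: str) -> int:
    """Offset of a record whose key is its record number as a hexadecimal string."""
    return int(search_key, 16) * RECORD_SIZE


print(direct_offset("0000000"))        # 0
print(direct_offset("0000001"))        # 100
print(direct_offset("1312cff"))        # 1999999900, the last of 20,000,000 records
```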

To simulate real conditions, the test programs generated random queries using the
random() function. To ensure that each test run used a new series of random numbers,
the random number generator was seeded with a new random number each time. The test
programs performed random queries over the entire range of possible records, and were
run several times to reach steady state conditions. The detailed usage times for the test
programs were obtained by employing the timex command and are shown in Table 1.
Each of the above-discussed programs is listed in the attached Appendices A-H, which are
incorporated entirely herein by reference.
                        gethash      getrec   getdirect
Number of Queries        10,000     100,000     100,000
Real (seconds)           700.12      102.75       95.68
User (seconds)             9.79        2.78        1.90
Sys (seconds)             55.92        8.60        4.18
Collisions            1,621,635         268           0
Random Queries/sec           14         973       1,045
Cached Queries/sec          186       9,930      20,408

Table 1. Performance data for test programs.

The last row of Table 1, Cached Queries/sec, shows the test results produced when
the random number generator was not seeded and the same series of "random" queries
was repeated. The disk cache eliminated I/O wait time and the performance increased by
a factor of 10 to 20. However, this does not reflect real conditions, where the queries are
expected to be random in nature.

The test results show that the number of collisions largely determines
performance. The getdirect program had no collisions. The getrec program had an
average of 268 collisions per 100,000 queries. The gethash program had an average of
162 collisions per single query. By virtually eliminating collisions, the getrec program
approached the performance of the getdirect program while employing modest resources
for the index file. Thus, this invention provides a method for fast and efficient record
retrieval in large databases that has nearly the performance of the theoretical minimal
solution.
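
The rates quoted above follow directly from the query counts, elapsed times, and collision counts in Table 1. The short sketch below recomputes them; the dictionary layout and function names are illustrative only.

```python
# (number of queries, real seconds, collisions), copied from Table 1
TABLE_1 = {
    "gethash":   (10_000, 700.12, 1_621_635),
    "getrec":    (100_000, 102.75, 268),
    "getdirect": (100_000, 95.68, 0),
}


def queries_per_second(program: str) -> float:
    queries, real_seconds, _ = TABLE_1[program]
    return queries / real_seconds


def collisions_per_query(program: str) -> float:
    queries, _, collisions = TABLE_1[program]
    return collisions / queries


for name in TABLE_1:
    print(f"{name:10s} {queries_per_second(name):8.1f} queries/sec  "
          f"{collisions_per_query(name):10.5f} collisions/query")
```

Truncated to whole numbers, the computed rates are 14, 973, and 1,045 random queries per second, and gethash averages about 162 collisions per query.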

There are many possible embodiments of this invention that relate to how the
index file and database files are organized. For the most part, these involve tradeoffs
between ease of adding or modifying records in the database versus the amount of disk
space required for the index file and database file. As will be appreciated by those of
ordinary skill in the art, what is best for one database application may differ for another
database application. For example, the database file may be organized with fixed width
records or variable width records having defined special record and field terminators.
Fixed width records make updating a record simple because it can be done in place. If a
variable width record needs to be updated and its record length has grown, the record
must be moved, usually to the end of the database file by appending the new record and
updating the record's index pointer to this new location. However, variable width records
can save substantial disk space because records are not filled with unused data.

Still other embodiments of this invention may use other CRC algorithms or other
combinations of CRC algorithms. For example, one embodiment of the invention may
use a single four-byte CRC-32 calculation instead of the combination CRC-CCITT/CRC-16
calculations in the above-described embodiment. In such an embodiment, the lower order
two bytes of a CRC-32 value for a record's key can be used as the index table offset value
(i.e., to determine the appropriate index table cluster in place of the CRC-CCITT value
described above). The higher order two bytes of the CRC-32 value for the record's key may
then be used to locate a database offset record within the appropriate index table cluster
(i.e., in place of the CRC-16 value described above). This alternate embodiment may
reduce the number of collisions by 50% over the previously described embodiment.
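
The split of a single CRC-32 value into a cluster selector and an in-cluster signature can be sketched as below. The text does not say which CRC-32 variant is intended, so this sketch assumes the common reflected form (polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF); the function names `crc32`, `clusterIndex` and `signature` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bitwise CRC-32 in its common reflected form (an assumed variant).
std::uint32_t crc32(const std::string& key) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char c : key) {
        crc ^= c;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Lower order two bytes: selects the index table cluster
// (the role played by the CRC-CCITT value in the first embodiment).
std::uint16_t clusterIndex(std::uint32_t crc) {
    return static_cast<std::uint16_t>(crc & 0xFFFFu);
}

// Higher order two bytes: compared against the entries inside the cluster
// (the role played by the CRC-16 value in the first embodiment).
std::uint16_t signature(std::uint32_t crc) {
    return static_cast<std::uint16_t>(crc >> 16);
}
```

For the conventional check string "123456789" this variant yields 0xCBF43926, so the cluster index would be 0x3926 and the stored signature 0xCBF4.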

With still another embodiment of the invention, a four-byte CRC-32 value may be
used in place of the two-byte CRC-16 values in the first-described embodiment discussed
above. While this alternate embodiment will require additional memory to accommodate
the additional two bytes required per index table cluster entry 105, the use of four-byte
CRC-32 values instead of the two-byte CRC-16 values will further reduce the frequency
of collisions. Of course, those of ordinary skill in the art will appreciate that still other
CRC algorithms or combinations of CRC algorithms can be employed.

Further, various embodiments of the invention may have other fields added to the
index table to reduce collisions or aid in the retrieval of the database record. For
example, each entry in the index cluster may include a field for a record's length. This
may be useful where a database employs a variable width record system.
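
As a minimal sketch of how a length field helps with variable width records, the entry below stores the record length next to the offset so that exactly one record can be read without scanning for a terminator. The names `IndexEntry` and `fetchRecord` are hypothetical, the field widths are assumptions chosen so the entry occupies 8 bytes, and an in-memory string stands in for the database file.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical index entry carrying a record length next to the offset.
struct IndexEntry {
    std::uint16_t crc16;   // second hash value stored for comparison
    std::uint16_t reclen;  // length of the record in bytes
    std::uint32_t offset;  // position of the record in the database
};
static_assert(sizeof(IndexEntry) == 8, "entry is expected to occupy 8 bytes");

// Read exactly one record using the offset and length from the index entry.
std::string fetchRecord(const std::string& database, const IndexEntry& entry) {
    return database.substr(entry.offset, entry.reclen);
}
```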

A database according to the various embodiments of the invention described
above can be implemented with any type of memory storage, such as memory provided
by an integrated circuit (commonly referred to as "main memory"), a disk containing a
magnetic, optical, or magneto-optical storage medium, a holographic storage memory,
etc. While main memory provides very fast data retrieval, this storage method requires a
very large amount of a relatively expensive storage medium to implement large databases.
Further, the memory must typically be initialized each time that the computer storing the
memory is restarted. Therefore, large databases according to the invention may be more
commonly implemented using a disk cache. (As is known in the art, a disk cache is a
portion of main memory set aside to temporarily store information read from another
storage medium, e.g., a storage disk, such as a magnetic, optical or magneto-optical storage
disk.) If information is retrieved from the database at random, however, as the size of a
database increases, the usefulness of a disk cache decreases significantly. The time
typically required to access data on a disk is about 8 ms. Thus, on average, a single disk
will only be able to make about 120 random accesses per second.

To increase throughput, a database according to the invention may be stored on
different storage disks. That is, the information in the database can be distributed across
different storage disks. Therefore, according to still another embodiment of the
invention, a CRC value or a portion of a CRC value can be used to physically distribute
the records in the database amongst a plurality of storage disks.
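
The throughput argument can be made concrete with a small calculation. The sketch below is only an upper-bound estimate under idealising assumptions (every access costs the same time, disks work independently, and records are spread evenly); the function names are hypothetical.

```cpp
#include <cassert>

// Random accesses one disk can serve per second for a given access time.
constexpr double accessesPerSecond(double accessTimeMs) {
    return 1000.0 / accessTimeMs;
}

// Aggregate rate when records are spread evenly over `disks` disks that
// are accessed independently (an idealising assumption).
constexpr double aggregateAccessesPerSecond(double accessTimeMs, int disks) {
    return disks * accessesPerSecond(accessTimeMs);
}
```

With an 8 ms access time the bound is 125 accesses per second for one disk, in line with the figure of roughly 120 quoted above, and about 2,000 per second for sixteen disks.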

With the previously discussed embodiments of the invention, a key for each
record is converted into one or more CRC values (e.g., a CRC-16 value, a CRC-32 value,
a CRC-CCITT value, or a combination of any of these CRC values) when the record is
initially stored in the database. This CRC value (or values) also can be used to determine
on which storage disk, out of a plurality of storage disks, the record will physically be
stored. Thus, if the database 401 is to be divided between sixteen storage disks 403-433
as shown in Fig. 4, four bits of a CRC value (or values) for a record's key can be used to
determine on which of the sixteen storage disks the record will physically be stored. For
example, if the first four bits of a CRC value for a record's key are 0010 (i.e., 2), then the
record may be stored on storage disk 407 (the third storage disk in the group). Similarly,
if the first four bits of a CRC value for a record's key are 1110 (i.e., 14), then the record
may be stored on storage disk 431 (the fifteenth storage disk in the group). Because the
CRC algorithms typically give a very even distribution of CRC values over a key range,
using CRC values to determine the physical distribution of records onto multiple storage
disks will give an even distribution of the records.
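
The disk selection just described amounts to extracting a few bits from the CRC value. The sketch below assumes a 16-bit CRC and reads "the first four bits" as the four most significant bits; both are assumptions, and `diskForCrc` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Select a disk from the most significant `bits` bits of a 16-bit CRC value.
// `bits` must be between 1 and 16; with the default of 4 the result is a
// disk number from 0 to 15.
constexpr unsigned diskForCrc(std::uint16_t crc, unsigned bits = 4) {
    return static_cast<unsigned>(crc) >> (16u - bits);
}
```

A CRC value beginning with bits 0010 maps to disk number 2 (the third disk), and one beginning with 1110 maps to disk number 14 (the fifteenth disk), matching the examples above.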

The present invention has been described above by way of specific exemplary
embodiments, and the many features and advantages of the present invention are apparent
from the written description. Thus, it is intended that the appended claims cover all such
features and advantages of the invention. Further, since numerous modifications and
changes will readily occur to those skilled in the art, the specification is not intended to
limit the invention to the exact construction and operation as illustrated and described.
For example, the invention may include any one or more elements from the apparatus and
methods described herein in any combination or subcombination. Accordingly, there are
any number of alternative combinations for defining the invention, which incorporate one
or more elements from the specification (including the drawings, claims, and summary of
the invention) in any combinations or subcombinations. Hence, all suitable modifications
and equivalents may be considered as falling within the scope of the appended claims.

What is claimed is:



1. A method of cataloging information in a database, comprising:
determining a key for referencing a record of information stored in a database;
determining a record address for the record in the database;
determining a cyclical redundancy check value for the key; and
storing the record address in an index at a position corresponding to at least a
portion of the cyclical redundancy check value.

2. The method of cataloging information recited in claim 1, wherein the cyclical
redundancy check value is a CRC-CCITT cyclical redundancy check value.

3. The method of cataloging information recited in claim 1, wherein the cyclical
redundancy check value is a CRC-16 cyclical redundancy check value.

4. The method of cataloging information recited in claim 1, wherein the cyclical
redundancy check value is a CRC-32 cyclical redundancy check value.

5. The method of cataloging information recited in claim 1, further including
determining a second cyclical redundancy check value for the key; and
storing the record on one of a plurality of storage devices based upon at least a
portion of the second cyclical redundancy check value.

6. The method of cataloging information recited in claim 5, wherein the at least a
portion of the second cyclical redundancy check value is the same as the at least a portion
of the first cyclical redundancy check value.

7. The method of cataloging information recited in claim 1, further including
determining a second cyclical redundancy check value for the key different from
the first cyclical redundancy check value; and
storing at least a portion of the second cyclical redundancy check value in the
index with the record address.

8. The method of cataloging information recited in claim 7, wherein the second
cyclical redundancy check value is a CRC-CCITT cyclical redundancy check value.

9. The method of cataloging information recited in claim 7, wherein the second
cyclical redundancy check value is a CRC-32 cyclical redundancy check value.

10. The method of cataloging information recited in claim 7, wherein the second
cyclical redundancy check value is a CRC-16 cyclical redundancy check value.

11. The method of cataloging information recited in claim 7, further including
determining a third cyclical redundancy check value for the key; and
storing the record on one of a plurality of storage devices based upon at least a
portion of the third cyclical redundancy check value.

12. The method of cataloging information recited in claim 11, wherein the third
cyclical redundancy check value is the same as the first cyclical redundancy check value.

13. The method of cataloging information recited in claim 11, wherein the third
cyclical redundancy check value is the same as the second cyclical redundancy check
value.

14. The method of cataloging information recited in claim 1, further including
storing at least a second portion of the cyclical redundancy check value in the index with
the record address.

15. The method of cataloging information recited in claim 14, wherein the
cyclical redundancy check value is a CRC-CCITT cyclical redundancy check value.

16. The method of cataloging information recited in claim 14, wherein the
cyclical redundancy check value is a CRC-32 cyclical redundancy check value.

17. The method of cataloging information recited in claim 14, wherein the second
cyclical redundancy check value is a CRC-16 cyclical redundancy check value.

18. The method of cataloging information recited in claim 1,
wherein the index is divided into index table clusters, each index table cluster
having K number of entries with each entry having L number of locations; and
further including
sequentially checking a status of each entry in an index table cluster
corresponding to the at least a portion of the cyclical redundancy check value until
a first unused entry available to store the record address is recognized, and
storing the record address in the recognized first available unused entry.

19. The method of cataloging information recited in claim 18, further including
if an unused entry available to store the record address is not recognized from
sequentially checking a status of each entry in the index table cluster corresponding to the
at least a portion of the cyclical redundancy check value, then
creating a second index table cluster corresponding to the at least a portion
of the cyclical redundancy check value in the index;
storing an address of the second index table cluster in the first index table
cluster; and
storing the record address in a first available unused entry of the second
index table cluster.

20. The method of cataloging information recited in claim 18, wherein an initial
location of the index table cluster corresponding to the at least a portion of the cyclical
redundancy check value is positioned at an offset Index(N) = N * K * L, where N is the at
least a portion of the cyclical redundancy check value.

21. A method of obtaining a record of information from a database, comprising:
determining a key for referencing the record in the database;
determining a cyclical redundancy check value for the key;
determining a position in an index corresponding to at least a portion of the
calculated cyclical redundancy check value;
retrieving an address for the record from the determined position in the index; and
obtaining the record from the database using the retrieved record address.

22. The method of obtaining a record of information from a database recited in
claim 21, wherein the cyclical redundancy check value is a CRC-CCITT cyclical
redundancy check value.

23. The method of obtaining a record of information from a database recited in
claim 21, wherein the cyclical redundancy check value is a CRC-16 cyclical redundancy
check value.

24. The method of obtaining a record of information from a database recited in
claim 21, wherein the cyclical redundancy check value is a CRC-32 cyclical redundancy
check value.

25. The method of obtaining a record of information from a database recited in
claim 21, further including:
determining a second cyclical redundancy check value for the key;
sequentially retrieving a stored at least a portion of a cyclical redundancy check
value from each of one or more index entries at the determined position in the index;
comparing each retrieved stored at least a portion of cyclical redundancy check
value with at least a portion of the second calculated cyclical redundancy check value
until a stored at least a portion of a cyclical redundancy check value is determined to be
identical to the at least a portion of the second calculated cyclical redundancy check
value; and
retrieving the address for the record from an index entry at the determined
position in the index having the stored at least a portion of a cyclical redundancy check
value determined to be identical to the at least a portion of the second calculated cyclical
redundancy check value.

26. The method of obtaining a record of information from a database recited in
claim 21, further including:
sequentially retrieving a stored at least a second portion of a cyclical redundancy
check value from each of one or more index entries at the determined position in the
index;
comparing each retrieved stored at least a portion of a cyclical redundancy check
value with a second portion of the calculated cyclical redundancy check value until a
stored at least a portion of a cyclical redundancy check value is determined to be identical
to the second portion of the cyclical redundancy check value; and
retrieving the address for the record from an index entry at the determined
position in the index having the stored at least a portion of cyclical redundancy check
value determined to be identical to the second portion of the cyclical redundancy check
value.

27. The method of obtaining a record of information from a database recited in
claim 21, wherein
the index is divided into index table clusters, each index table cluster having K
number of index entries with each index entry having L number of locations; and
an initial location of an index table cluster corresponding to the at least a portion
of the cyclical redundancy check value is positioned at an offset Index(N) = N * K * L,
where N is the at least a portion of the cyclical redundancy check value.




28. A computer-readable medium having stored thereon a data structure,
comprising:
a first data field having an address for a record in a database, such that a position
of the first data field in the data structure corresponds to at least a portion of a cyclical
redundancy check value for a key of the record.

29. The computer-readable medium of claim 28, further including:
a second data field having a second portion of the cyclical redundancy check
value.

30. The computer-readable medium of claim 28, further including:
a second data field having at least a portion of a second cyclical redundancy check
value for the key different from the first cyclical redundancy check value for the key.

31. The computer-readable medium of claim 28, wherein the cyclical redundancy
check value is a CRC-CCITT cyclical redundancy check value.

32. The computer-readable medium of claim 28, wherein the cyclical redundancy
check value is a CRC-16 cyclical redundancy check value.

33. The computer-readable medium of claim 28, wherein the cyclical redundancy
check value is a CRC-32 cyclical redundancy check value.

34. The computer-readable medium of claim 28, wherein
the data structure is divided into index table clusters, each index table cluster
having K number of index entries with each index entry having L number of locations;
and
an initial location of an index table cluster corresponding to the at least a portion
of the cyclical redundancy check value is positioned at an offset Index(N) = N * K * L,
where N is the at least a portion of the cyclical redundancy check value.
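
The cluster addressing and overflow handling recited in claims 18 through 20 can be illustrated with a short sketch. It is not the code of the test programs: `ClusterIndex`, `Entry`, `clusterOffset` and `kUnused` are hypothetical names, the index is held in memory and addressed in whole entries rather than bytes, and each cluster is assumed to end with one extra link entry that names its overflow cluster.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kUnused = 0xFFFFFFFFu;  // marks an entry holding no address

// Byte offset of cluster N when each cluster has K entries of L bytes:
// Index(N) = N * K * L.
constexpr std::uint64_t clusterOffset(std::uint64_t n, std::uint64_t k,
                                      std::uint64_t l) {
    return n * k * l;
}

struct Entry {
    std::uint16_t crc16 = 0;
    std::uint32_t address = kUnused;
};

// Hypothetical in-memory index. Each cluster holds `slots` record entries
// followed by one link entry whose address names an overflow cluster.
class ClusterIndex {
public:
    ClusterIndex(std::size_t clusters, std::size_t slots)
        : slots_(slots), table_(clusters * (slots + 1)) {}

    // Store a record address in the cluster chain selected by n and return
    // the table position (counted in entries) that was used.
    std::size_t insert(std::size_t n, std::uint16_t crc16, std::uint32_t address) {
        std::size_t start = n * (slots_ + 1);
        for (;;) {
            // Sequentially check each entry until the first unused one.
            for (std::size_t i = 0; i < slots_; ++i) {
                Entry& e = table_[start + i];
                if (e.address == kUnused) {
                    e.crc16 = crc16;
                    e.address = address;
                    return start + i;
                }
            }
            const std::size_t link = start + slots_;
            if (table_[link].address == kUnused) {
                // Cluster is full and has no successor: create one at the end
                // of the index and store its position in the link entry.
                const std::size_t fresh = table_.size();
                table_.resize(fresh + slots_ + 1);
                table_[link].address = static_cast<std::uint32_t>(fresh);
            }
            start = table_[link].address;  // follow the link
        }
    }

    std::size_t size() const { return table_.size(); }

private:
    std::size_t slots_;
    std::vector<Entry> table_;
};
```

For instance, with K = 101 entries of L = 8 bytes each, `clusterOffset(2, 101, 8)` gives a byte offset of 1616 for the cluster selected by N = 2.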
ABSTRACT

A method for fast and efficient record retrieval in large databases using cyclical
redundancy check (CRC) computations as hash functions. Two hash values are computed
for each record's key using the CRC-CCITT and CRC-16 generator polynomials. The
two CRC values then are combined into a four-byte composite hash value that represents
a binary signature of the record's key. Alternately, a single CRC-32 value can be used as
a four-byte hash value. In most cases, this four-byte hash value uniquely identifies the
record's key. An index file is constructed using a hybrid search method, part hash table
and part linear search. The index file is searched to find a match for the four-byte hash
value and the record's offset is obtained. The record's offset is used to retrieve the record
from the database.
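
A four-byte composite hash of the kind summarized above can be sketched as follows. The table-driven routines of the test programs are not reproduced here; this sketch uses bitwise implementations and assumes particular variants (CRC-CCITT with polynomial 0x1021 and initial value 0, unreflected; CRC-16 with the reflected polynomial 0xA001 and initial value 0). Placing the CRC-CCITT value in the upper two bytes is likewise an assumption, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bitwise CRC-CCITT (polynomial 0x1021, initial value 0, not reflected).
std::uint16_t crcCcitt(const std::string& key) {
    std::uint16_t crc = 0;
    for (unsigned char c : key) {
        crc ^= static_cast<std::uint16_t>(c << 8);
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x8000u)
                      ? static_cast<std::uint16_t>((crc << 1) ^ 0x1021u)
                      : static_cast<std::uint16_t>(crc << 1);
        }
    }
    return crc;
}

// Bitwise CRC-16 (reflected polynomial 0xA001, initial value 0).
std::uint16_t crc16(const std::string& key) {
    std::uint16_t crc = 0;
    for (unsigned char c : key) {
        crc ^= c;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u)
                      ? static_cast<std::uint16_t>((crc >> 1) ^ 0xA001u)
                      : static_cast<std::uint16_t>(crc >> 1);
        }
    }
    return crc;
}

// Combine both 16-bit values into one four-byte signature of the key.
std::uint32_t compositeHash(const std::string& key) {
    return (static_cast<std::uint32_t>(crcCcitt(key)) << 16) | crc16(key);
}
```

Two keys collide on the composite value only if both CRC values agree, which is why the combined signature identifies the key in most cases.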
Figure 1. Organization of the Index File.

This directory includes the source files used to
build test programs supporting the patent application,
"Method for fast and efficient record retrieval in large databases."

fastdb.mk - make file to build programs
fastdb.h  - header file for programs
mkdata.C  - program to generate test Data file
mkindx.C  - program to create index file (InsertRecord)
getrec.C  - main test program to retrieve records
query.C   - query record routines (3 methods)
crc.C     - table driven crc routines for CRC-CCITT and CRC-16

To build programs:

    $ make -f fastdb.mk        (uses g++ to compile)

To build a Data file with 1,000,000 records:

    $ mkdata 1000000 > Data

To build the Index file:

    $ mkindx Data

To test 10,000 random queries of 1,000,000 records:

    $ timex getrec -s -l 10000 -r 1000000

Or to query a specific record:

    $ getrec 0001000

The "gethash" and "getdirect" commands are also built. These work
the same as "getrec", but implement different search methods
that were used in the test results of the patent application.

The test programs make a few assumptions:

1) The Data file is named "Data" and fields are <tab> separated.

2) The index file is named "keyindx".

3) The first field of "Data" file is the indexed "key" field.

4) The second field of "Data" file is the decimal version of the first field.
   This is used as a validation in getrec to verify that the correct
   record was retrieved.

5) The parsing of the Data record is left to the application.

6) The Data schema file (for named fields) is not supported.

7) The Index file was created on the same machine as getrec is run, i.e.,
   no provisions are made for machine dependent byte order of integers.

8) The CRCBUCKETSIZE (number of entries in an index bucket) should
   be tunable and stored in the index file (or somewhere). Currently,
   it is a #define in fastdb.h.

9) The index field is assumed to be unique. The query function
   finds the first match and returns; it does not look for multiple
   records that match.
#
# Makefile to build test programs in patent application
#

CRCSRC = crc.C
MAINSRC = getrec.C

CRCOBJ = $(CRCSRC:.C=.o)
MAINOBJ = $(MAINSRC:.C=.o)

# Solaris compiler
#CFLAGS = -O2
#CPP = CC
#CC = cc

# gnu compiler
CFLAGS = -O2
CPP = g++
CC = gcc

# Standard rules for making C...
.C.o:
	$(CPP) $(CFLAGS) -c $<

.c.o:
	$(CC) $(CFLAGS) -c $<

# steps

product: $(MAINOBJ) $(CRCOBJ)
	$(CPP) $(CFLAGS) -o mkdata mkdata.C
	$(CPP) $(CFLAGS) -o mkindx mkindx.C $(CRCOBJ)
	$(CPP) $(CFLAGS) -o getrec $(MAINOBJ) query.C $(CRCOBJ)
	$(CPP) $(CFLAGS) -o gethash -DSIMPLE_HASH $(MAINOBJ) query.C $(CRCOBJ)
	$(CPP) $(CFLAGS) -o getdirect -DGET_DIRECT $(MAINOBJ) query.C

clean:
	rm -f *.o



APPENDIX C

File: fastdb.h



#ifndef ATT_fastdb_h
#define ATT_fastdb_h

//
// Common header file for fast index file based on CRC signatures.
//

//
// CRC compute routines
//
#define ushort unsigned short

void crcstr ( register ushort *accum, register unsigned char *str );
void crc16str ( register ushort *accum, register unsigned char *str );

//
// Name: keyindx_t
//
// Description: Data structure of primary index file. Contains
//              the CRC-16 checksum for this "key" and the record
//              "offset" into the Data file.
//
// Note: pragma pack(2) was removed because the SUN compiler does not
//       support it properly (crashes when "long" not on 4-byte boundary).
//       So to line up on 4-byte boundary, I added "short" reclen to
//       data structure, increasing index entry size to 8 bytes.
//
//       The test results in the patent application were done
//       with pragma pack(2) enabled on GNU compiler and without
//       the reclen field.
//
//#pragma pack(2)    // pack on 2 byte boundary
typedef struct {
    unsigned short crc16;    // CRC-16 checksum for key
    unsigned short reclen;   // length of record
    unsigned long  offset;   // record index in Data file
} keyindx_t;
//#pragma pack()     // restore default packing

// Number of entries in CRC-index file, 2**16
#define CRCTABLESIZE 65536

// Number of entries in CRC-bucket
// This should be a tunable parameter based on
// the number of records in the Data base.
#define CRCBUCKETSIZE 100

// Maximum offset for 32-bit file, marks unused entries on CRC-index table
#define MAXOFFSET 0xffffffff

// Maximum record length in Data file
#define MAXRECORD 128

#endif /* #ifndef ATT_fastdb_h */
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>


//
// Creates test Data file with specified number of records
// format of Data file is:
//
// <7 Hex digits><tab><decimal string><newline>
//
// Example (16th record): 000010 16
//
// Query basically converts Hex (key) into Decimal via Data lookup.
//
// The text of the filler string below is cut off in the source listing.
char stuff[] = " this is a test data file that contains about 100 bytes/record line. Is for test";

int main ( int argc, char *argv[] )
{
    unsigned int i;
    unsigned int cnt;

    if ( argc != 2 || (cnt = (unsigned int) atoi ( argv[1] )) < 1 )
    {
        printf ( "usage: %s <count>\n", argv[0] );
        exit (1);
    }

    for ( i = 0; i < cnt; i++ )
        printf ( "%7.7x\t%8.8d\t%82.82s\n", i, i, stuff );

    return 0;
}
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include "fastdb.h"



int InsertRecordIndex ( int keyfd, char *key, unsigned long offset );

keyindx_t newbucket [CRCBUCKETSIZE+1];

//
// Name: mkindx
//
// Description:
//   Program to create double hash CRC index files from
//   a Data file. This index file is used by getrec
//   for testing QueryRecord() performance.
//
//   Creates "keyindx", a double hash CRC index file.
//

int main ( int argc, char *argv[] )
{
    char buf [4096];
    int bytes;
    int lines;
    FILE *fdata;
    char *ptr;
    int keyfd;
    int reclen;
    int i;
    unsigned long offset;
    keyindx_t key;

    if ( argc != 2 || *argv[1] == '\0' )
    {
        printf ( "Usage: %s <filename>\n", argv[0] );
        exit (1);
    }

    // initialize CRC bucket
    for ( i = 0; i < CRCBUCKETSIZE + 1; i++ )
    {
        newbucket [i].crc16 = 0;
        newbucket [i].offset = MAXOFFSET;
    }

    // open Data file to build index from...
    fdata = fopen ( argv[1], "r" );
    if ( !fdata )
    {
        printf ( "Can't open %s for read. %s\n",
                 argv[1], strerror ( errno ) );
        exit (1);
    }

    // preallocate 2**16 keyindx file buckets in key index file
    keyfd = open ( "keyindx", O_RDWR | O_CREAT | O_TRUNC, 0640 );
    if ( keyfd == -1 )    // open() reports failure with -1, not 0
    {
        printf ( "Can't open keyindx for write. %s\n",
                 strerror ( errno ) );
        exit (1);
    }

    // initialize CRC index file
    for ( i = 0; i < CRCTABLESIZE; i++ )
    {
        if ( write ( keyfd, (char *) newbucket, sizeof (newbucket) )
             != sizeof (newbucket) )
        {
            printf ( "write to keyindx file failed\n" );
            exit (1);
        }
    }

    lines = 0;
    offset = 0;
    for ( offset = 0; ; offset += (unsigned long) reclen )
    {
        // get next record
        ptr = fgets ( buf, sizeof (buf), fdata );
        if ( !ptr )
            break;

        // save this record size for offset update
        reclen = strlen ( buf );

        // zap newline
        ptr = strchr ( buf, '\n' );
        if ( ptr )
            *ptr = 0;

        // get email address, just happens to be first field.
        ptr = strchr ( buf, '\t' );
        if ( ptr )
            *ptr = 0;

        // Add record to index file...
        InsertRecordIndex ( keyfd, buf, offset );
        lines++;
    }

    close ( keyfd );
    return 0;
}

//
// InsertRecordIndex adds an index entry for this record's offset.
//
int InsertRecordIndex ( int keyfd, char *key, unsigned long offset )
{
    int i;
    unsigned long startoff;
    unsigned long curpos;
    unsigned short crc;
    unsigned short crc16;
    keyindx_t crcbucket [CRCBUCKETSIZE+1];
    keyindx_t *keyptr;

    // calculate CRC-CCITT checksum
    crc = 0;
    crcstr ( &crc, (unsigned char *) key );

    // calculate CRC-16 checksum
    crc16 = 0;
    crc16str ( &crc16, (unsigned char *) key );

    // find right bucket
    startoff = (unsigned long) crc * (CRCBUCKETSIZE + 1) * sizeof (keyindx_t);

    while ( 1 )
    {
        // seek to this bucket's offset
        curpos = lseek ( keyfd, startoff, SEEK_SET );
        if ( curpos != startoff )
        {
            printf ( "lseek in keyindx file failed\n" );
            exit (1);
        }

        // read the bucket
        if ( read ( keyfd, (char *) crcbucket, sizeof (crcbucket) )
             != sizeof (crcbucket) )
        {
            printf ( "read from keyindx file failed\n" );
            exit (1);
        }

        // search for empty slot
        keyptr = crcbucket;
        for ( i = 0; i < CRCBUCKETSIZE; i++, keyptr++ )
        {
            // check for empty slot in bucket
            if ( keyptr->offset == MAXOFFSET && keyptr->crc16 == 0 )
            {
                // found empty slot, use it
                keyptr->offset = offset;
                keyptr->crc16 = crc16;
                lseek ( keyfd, startoff + (i * sizeof (keyindx_t)), SEEK_SET );
                if ( write ( keyfd, (char *) keyptr, sizeof (keyindx_t) )
                     != sizeof (keyindx_t) )
                {
                    printf ( "write entry to keyindx file failed\n" );
                    exit (1);
                }
                return ( 1 );
            }

            // check for possible duplicate record
            if ( keyptr->crc16 == crc16 )
                printf ( "possible duplicate crc = 0x%X crc16 = 0x%X key = %s\n",
                         crc, crc16, key );
        }

        // no empty slot found, check if next crcbucket is linked.
        // keyptr now points at the extra (link) entry of the bucket.
        if ( keyptr->offset == MAXOFFSET )
            break;    // no more buckets

        // follow link to next bucket...
        startoff = keyptr->offset;
        //printf ( "Linking to next bucket at 0x%x\n", startoff );
    } // while ( 1 )

    // ran out of space, need to allocate and link new bucket
    //printf ( "Allocating new bucket for crc = 0x%x\n", crc );
    keyptr->offset = lseek ( keyfd, 0, SEEK_END );

    // initialize new CRC bucket
    newbucket [0].offset = offset;
    newbucket [0].crc16 = crc16;

    // write new bucket to end of key index file
    if ( write ( keyfd, (char *) newbucket, sizeof (newbucket) )
         != sizeof (newbucket) )
    {
        printf ( "write bucket to end of keyindx file failed\n" );
        exit (1);
    }

    // update previous bucket's next pointer to this new bucket's offset
    lseek ( keyfd, startoff + (i * sizeof (keyindx_t)), SEEK_SET );
    if ( write ( keyfd, (char *) keyptr, sizeof (keyindx_t) )
         != sizeof (keyindx_t) )
    {
        printf ( "write entry to keyindx file failed. %s\n",
                 strerror ( errno ) );
        exit (1);
    }

    return ( 0 );
}
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>      // for time(), used when seeding the generator
#include "fastdb.h"

extern int FalseDigs;
extern int LinksFollowed;
extern "C" long random();



// fastdb query function
int QueryRecord ( char *key, char *record );

void usage ( char *progname )
{
    printf (
        "Usage: %s [-s] [-l <loop count>] [-r <random key max>] <key(s)...>\n",
        progname );
    printf ( " [-s] - seed random number generator\n" );
    printf ( " [-l <loop count>] - number of queries to run\n" );
    printf ( " [-r <random key max>] - max number of records in Data\n" );
    exit (1);
}

//
// test program
//
// The -s option will seed the random number generator to produce
// a different set of random numbers every time it is invoked.
//
// The -l option will cause the query program to loop the specified
// number of times, over the keys specified.
//
// The -r option will generate random 7 Hex digit keys designed to
// work with the Data file created by "mkdata". For example:
//
//     getrec -l 10000 -r 1000000
//
// will query 10,000 times using random keys from 0 to 1,000,000
// in 7 Hex digit format. It also checks the answer from the
// record found.
//

int main ( int argc, char *argv[] )
{
    char record [MAXRECORD];
    char *key;
    int i;
    int loop = 1;
    int rmax = 0;
    long rval;
    long rmask;
    char rkey[10];
    int seed;
    int value;
    char *ptr;

    while ( (i = getopt ( argc, argv, "?sl:r:" )) != -1 )
        switch (i)
        {
        case 's':
            seed = (unsigned int) time(0) * getpid();
            seed = seed & 0x3fff;
            printf ( "seed = %d\n", seed );
            srandom ( seed );
            break;
        case 'l':
            loop = atoi ( optarg );
            if ( loop < 1 ) loop = 1;
            break;
        case 'r':
            rmax = atoi ( optarg );
            if ( rmax < 0 ) rmax = 0;
            break;
        default:
            usage ( argv[0] );
        }

    // how many keys are there?
    int keycount = argc - optind;
    if ( keycount < 1 && !rmax )
        usage ( argv[0] );
    else if ( keycount > 0 && rmax )
        usage ( argv[0] );
    else if ( rmax )
    {
        // use one key
        keycount = 1;

        // compute the minimum bit mask for max random value
        for ( rmask = 0x3fffffff; rmask >= rmax; rmask = rmask >> 1 )
            /* keep shifting */ ;
        rmask = (rmask * 2) + 1;    // one too many
        printf ( "using rmask = 0x%8.8lx\n", rmask );
    }

    while ( loop-- )
    {
        // find each key in Data base
        for ( i = 0; i < keycount; i++ )
        {
            if ( !rmax )
                key = argv [optind + i];
            else
            {
                while ( (rval = random() & rmask) >= rmax )
                    /* keep looking */ ;
                sprintf ( rkey, "%7.7lx", rval );
                key = rkey;
            }

            if ( QueryRecord ( key, record ) )
            {
                if ( rmax )
                {
                    // check record for right value
                    ptr = strchr ( record, '\t' );
                    if ( !ptr )
                    {
                        printf ( "No tab in: %s\n", record );
                        exit (1);
                    }
                    sscanf ( ptr, "%d", &value );
                    if ( rval != value )
                    {
                        printf ( "Bad record: %s\n", record );
                        exit (1);
                    }
                }
                if ( !loop )
                    printf ( "Found: %s\n", record );
            }
            else
            {
                printf ( "Record '%s' not found.\n", key );
            }
        }
    }

    printf ( "FalseDigs = %d, LinksFollowed = %d\n", FalseDigs, LinksFollowed );
    return 0;
}
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include "fastdb.h"



int FalseDigs = 0;
int LinksFollowed = 0;

//
// Name: QueryRecord
//
// Description:
//   Query record routine using double CRC hash index files
//
//   If SIMPLE_HASH is defined, routine simulates a simple hash
//   search by ignoring the second hash value for test comparison.
//
//   If GET_DIRECT is defined, routine simulates a theoretical
//   optimal solution by translating search key into record offset
//   via computation and seeks directly to get the record
//



#ifndef GET_DIRECT



int QueryRecord ( char *key, char *record )
{
    int i;
    unsigned short crc;
    unsigned short crc16;
    unsigned long startoff;
    unsigned long curpos;
    int bytes;
    char *ptr;

    keyindx_t crcbucket[CRCBUCKETSIZE+1];
    keyindx_t *keyptr;
    static int keyfd = -1;
    static int datafd = -1;

    // first time file open of keyindx and data file
    if ( keyfd == -1 )
    {
        // open the key index file
        keyfd = open ( "keyindx", O_RDONLY );
        if ( keyfd == -1 )
        {
            printf ( "Can't open keyindx for read. %s\n",
                     strerror ( errno ) );
            exit (1);
        }

        // open the Data file
        datafd = open ( "Data", O_RDONLY );
        if ( datafd == -1 )
        {
            printf ( "Can't open Data for read. %s\n",
                     strerror ( errno ) );
            close ( keyfd );
            keyfd = -1;
            exit (1);
        }
    }

    // calculate CRC-CCITT checksum
    crcstr ( &crc, (unsigned char *) key );

#ifndef SIMPLE_HASH
    // calculate CRC-16 checksum
    crc16str ( &crc16, (unsigned char *) key );
#endif

    // find right bucket
    startoff = (unsigned long) crc * (CRCBUCKETSIZE + 1) * sizeof (keyindx_t);

    while ( 1 )
    {
        // seek to this bucket's offset
        curpos = lseek ( keyfd, startoff, SEEK_SET );
        if ( curpos != startoff )
        {
            printf ( "lseek in keyindx file failed\n" );
            exit (1);
        }

        // read the bucket
        if ( read ( keyfd, (char *) crcbucket, sizeof (crcbucket) )
             != sizeof (crcbucket) )
        {
            printf ( "read from keyindx file failed\n" );
            exit (1);
        }

        // search for crc16 match
        keyptr = crcbucket;
        for ( i = 0; i < CRCBUCKETSIZE; i++, keyptr++ )
        {
#ifndef SIMPLE_HASH
            // check for crc16 match slot in bucket
            if ( keyptr->crc16 == crc16 && keyptr->offset != MAXOFFSET )
#else
            // simulating simple hash, ignoring crc16 value...
            if ( keyptr->offset != MAXOFFSET )
#endif
            {
                // found match, read record for key verification...
                curpos = lseek ( datafd, keyptr->offset, SEEK_SET );
                if ( curpos != keyptr->offset )
                {
                    printf ( "lseek in Data file failed\n" );
                    exit (1);
                }

                bytes = read ( datafd, record, MAXRECORD - 1 );
                if ( bytes < 2 )
                {
                    printf ( "bad read, bytes = %d\n", bytes );
                    exit (1);
                }

                record[bytes] = 0;
                ptr = strchr ( record, '\n' );
                if ( !ptr )
                {
                    printf ( "error, no new line found\n" );
                    exit (1);
                }
                *ptr = 0;

                // get first field, e-mail key is first
                ptr = strchr ( record, '\t' );
                if ( !ptr )
                {
                    printf ( "bad read, no tab found\n" );
                    exit (1);
                }

                // check record (key is first field)
                *ptr = 0;
                if ( strcmp ( record, key ) == 0 )
                {
                    // found record
                    *ptr = '\t';
                    return 1;
                }
                else
                {
                    //printf ( "false dig, found: '%s'\n",
                    //record );
                    FalseDigs++;
                }
            }
        }

        // no match found in this bucket, check if next crcbucket is linked
        if ( keyptr->offset == MAXOFFSET )
            break; // no more buckets, record not found

        // follow link to next bucket...
        startoff = keyptr->offset;
        LinksFollowed++;

        //printf ( "Linking to next bucket at 0x%x\n", startoff );
    } // while ( 1 )

    // record not found
    printf ( "record not found, key = %s\n", key );
    return ( 0 );
}

#else 

//
// Name: QueryRecord
//
// Description:
//     Query record routine using the theoretical optimal solution.
//     Does not use an index file; rather, it transforms the search
//     key into a record offset via computation. Basically, it
//     provides baseline performance for random seeks/reads
//     on a large data file.
//

int QueryRecord ( char *key, char *record )
{
    int i;
    unsigned long startoff;
    unsigned long curpos;
    unsigned int rindex;
    int bytes;
    char *ptr;

    static int datafd = -1;

    // first time file open of data file
    if ( datafd == -1 )
    {
        // open the Data file
        datafd = open ( "Data", O_RDONLY );
        if ( datafd == -1 )
        {
            printf ( "Can't open Data for read. %s\n",
                     strerror ( errno ) );
            exit (1);
        }
    }

    // translate key into record number
    sscanf ( key, "%x", &rindex );
    startoff = (unsigned long) rindex * 100;

    // seek directly to record and read it
    curpos = lseek ( datafd, startoff, SEEK_SET );
    if ( curpos != startoff )
    {
        printf ( "lseek in Data file failed\n" );
        exit (1);
    }

    bytes = read ( datafd, record, MAXRECORD - 1 );
    if ( bytes < 2 )
    {
        printf ( "bad read, bytes = %d\n", bytes );
        exit (1);
    }

    record[bytes] = 0;
    ptr = strchr ( record, '\n' );
    if ( !ptr )
    {
        printf ( "error, no new line found\n" );
        exit (1);
    }
    *ptr = 0;

    // get first field, e-mail key is first
    ptr = strchr ( record, '\t' );
    if ( !ptr )
    {
        printf ( "bad read, no tab found\n" );
        exit (1);
    }

    // check record (key is first field)
    *ptr = 0;
    if ( strcmp ( record, key ) == 0 )
    {
        // found record
        *ptr = '\t';
        return 1;
    }
    else
    {
        printf ( "false dig, found: '%s'\n", record );
        FalseDigs++;
    }

    // record not found
    printf ( "record not found, key = %s\n", key );
    return ( 0 );
}

#endif // GET_DIRECT solution 
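
The index search in the first QueryRecord routine can be illustrated without any file I/O. The sketch below is hedged and rests on assumptions: the types `slot_t` and `bucket_t`, the constants `BUCKET_SLOTS` and `NO_OFFSET`, and the function `find_candidates` are hypothetical stand-ins for `keyindx_t`, `CRCBUCKETSIZE`, `MAXOFFSET` and the loop in the listing, since `fastdb.h` is not reproduced here. It also simplifies two things: the link slot holds a bucket index rather than a byte offset, and candidates are collected for the caller to verify instead of being checked against the data file on the spot.

```c
#include <assert.h>
#include <stddef.h>

#define BUCKET_SLOTS 4             /* hypothetical stand-in for CRCBUCKETSIZE */
#define NO_OFFSET    0xffffffffUL  /* hypothetical stand-in for MAXOFFSET */

/* Assumed slot layout; the real keyindx_t is defined in fastdb.h. */
typedef struct {
    unsigned short crc16;   /* second hash of the key held in this slot */
    unsigned long  offset;  /* record offset, or NO_OFFSET when unused */
} slot_t;

/* A bucket is BUCKET_SLOTS data slots plus one trailing link slot. */
typedef struct {
    slot_t slot[BUCKET_SLOTS + 1];
} bucket_t;

/* Walk the bucket chain selected by the first hash and collect the
 * offsets of every slot whose second hash matches. Returns the number
 * of candidate offsets written to out[]. */
static size_t find_candidates(const bucket_t *buckets, unsigned long first,
                              unsigned short second,
                              unsigned long *out, size_t max_out)
{
    size_t n = 0;
    unsigned long b = first;

    assert(buckets != NULL && out != NULL);
    for (;;) {
        const bucket_t *bk = &buckets[b];
        int i;

        for (i = 0; i < BUCKET_SLOTS; i++) {
            const slot_t *s = &bk->slot[i];
            if (s->crc16 == second && s->offset != NO_OFFSET && n < max_out)
                out[n++] = s->offset;
        }
        /* The trailing slot is the link: NO_OFFSET ends the chain. */
        if (bk->slot[BUCKET_SLOTS].offset == NO_OFFSET)
            break;
        b = bk->slot[BUCKET_SLOTS].offset;
    }
    return n;
}
```

Each returned offset is only a candidate. As in the listing, the caller would still read the record at that offset and compare its first field with the search key, counting a mismatch as a false dig.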
#define ushort unsigned short

//
// table for CRC-CCITT polynomial, i.e. X**16 + X**12 + X**5 + 1 (0x1021)
//
static const ushort crctbl[256] =
{
    0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50A5, 0x60C6, 0x70E7,
    0x8108, 0x9129, 0xA14A, 0xB16B, 0xC18C, 0xD1AD, 0xE1CE, 0xF1EF,
    0x1231, 0x0210, 0x3273, 0x2252, 0x52B5, 0x4294, 0x72F7, 0x62D6,
    0x9339, 0x8318, 0xB37B, 0xA35A, 0xD3BD, 0xC39C, 0xF3FF, 0xE3DE,
    0x2462, 0x3443, 0x0420, 0x1401, 0x64E6, 0x74C7, 0x44A4, 0x5485,
    0xA56A, 0xB54B, 0x8528, 0x9509, 0xE5EE, 0xF5CF, 0xC5AC, 0xD58D,
    0x3653, 0x2672, 0x1611, 0x0630, 0x76D7, 0x66F6, 0x5695, 0x46B4,
    0xB75B, 0xA77A, 0x9719, 0x8738, 0xF7DF, 0xE7FE, 0xD79D, 0xC7BC,
    0x48C4, 0x58E5, 0x6886, 0x78A7, 0x0840, 0x1861, 0x2802, 0x3823,
    0xC9CC, 0xD9ED, 0xE98E, 0xF9AF, 0x8948, 0x9969, 0xA90A, 0xB92B,
    0x5AF5, 0x4AD4, 0x7AB7, 0x6A96, 0x1A71, 0x0A50, 0x3A33, 0x2A12,
    0xDBFD, 0xCBDC, 0xFBBF, 0xEB9E, 0x9B79, 0x8B58, 0xBB3B, 0xAB1A,
    0x6CA6, 0x7C87, 0x4CE4, 0x5CC5, 0x2C22, 0x3C03, 0x0C60, 0x1C41,
    0xEDAE, 0xFD8F, 0xCDEC, 0xDDCD, 0xAD2A, 0xBD0B, 0x8D68, 0x9D49,
    0x7E97, 0x6EB6, 0x5ED5, 0x4EF4, 0x3E13, 0x2E32, 0x1E51, 0x0E70,
    0xFF9F, 0xEFBE, 0xDFDD, 0xCFFC, 0xBF1B, 0xAF3A, 0x9F59, 0x8F78,
    0x9188, 0x81A9, 0xB1CA, 0xA1EB, 0xD10C, 0xC12D, 0xF14E, 0xE16F,
    0x1080, 0x00A1, 0x30C2, 0x20E3, 0x5004, 0x4025, 0x7046, 0x6067,
    0x83B9, 0x9398, 0xA3FB, 0xB3DA, 0xC33D, 0xD31C, 0xE37F, 0xF35E,
    0x02B1, 0x1290, 0x22F3, 0x32D2, 0x4235, 0x5214, 0x6277, 0x7256,
    0xB5EA, 0xA5CB, 0x95A8, 0x8589, 0xF56E, 0xE54F, 0xD52C, 0xC50D,
    0x34E2, 0x24C3, 0x14A0, 0x0481, 0x7466, 0x6447, 0x5424, 0x4405,
    0xA7DB, 0xB7FA, 0x8799, 0x97B8, 0xE75F, 0xF77E, 0xC71D, 0xD73C,
    0x26D3, 0x36F2, 0x0691, 0x16B0, 0x6657, 0x7676, 0x4615, 0x5634,
    0xD94C, 0xC96D, 0xF90E, 0xE92F, 0x99C8, 0x89E9, 0xB98A, 0xA9AB,
    0x5844, 0x4865, 0x7806, 0x6827, 0x18C0, 0x08E1, 0x3882, 0x28A3,
    0xCB7D, 0xDB5C, 0xEB3F, 0xFB1E, 0x8BF9, 0x9BD8, 0xABBB, 0xBB9A,
    0x4A75, 0x5A54, 0x6A37, 0x7A16, 0x0AF1, 0x1AD0, 0x2AB3, 0x3A92,
    0xFD2E, 0xED0F, 0xDD6C, 0xCD4D, 0xBDAA, 0xAD8B, 0x9DE8, 0x8DC9,
    0x7C26, 0x6C07, 0x5C64, 0x4C45, 0x3CA2, 0x2C83, 0x1CE0, 0x0CC1,
    0xEF1F, 0xFF3E, 0xCF5D, 0xDF7C, 0xAF9B, 0xBFBA, 0x8FD9, 0x9FF8,
    0x6E17, 0x7E36, 0x4E55, 0x5E74, 0x2E93, 0x3EB2, 0x0ED1, 0x1EF0
};



//
// table for CRC-16 polynomial, i.e. X**16 + X**15 + X**2 + 1 (0x8005)
//
static const ushort crc16tbl[256] =
{
    0x0000, 0x8005, 0x800F, 0x000A, 0x801B, 0x001E, 0x0014, 0x8011,
    0x8033, 0x0036, 0x003C, 0x8039, 0x0028, 0x802D, 0x8027, 0x0022,
    0x8063, 0x0066, 0x006C, 0x8069, 0x0078, 0x807D, 0x8077, 0x0072,
    0x0050, 0x8055, 0x805F, 0x005A, 0x804B, 0x004E, 0x0044, 0x8041,
    0x80C3, 0x00C6, 0x00CC, 0x80C9, 0x00D8, 0x80DD, 0x80D7, 0x00D2,
    0x00F0, 0x80F5, 0x80FF, 0x00FA, 0x80EB, 0x00EE, 0x00E4, 0x80E1,
    0x00A0, 0x80A5, 0x80AF, 0x00AA, 0x80BB, 0x00BE, 0x00B4, 0x80B1,
    0x8093, 0x0096, 0x009C, 0x8099, 0x0088, 0x808D, 0x8087, 0x0082,
    0x8183, 0x0186, 0x018C, 0x8189, 0x0198, 0x819D, 0x8197, 0x0192,
    0x01B0, 0x81B5, 0x81BF, 0x01BA, 0x81AB, 0x01AE, 0x01A4, 0x81A1,
    0x01E0, 0x81E5, 0x81EF, 0x01EA, 0x81FB, 0x01FE, 0x01F4, 0x81F1,
    0x81D3, 0x01D6, 0x01DC, 0x81D9, 0x01C8, 0x81CD, 0x81C7, 0x01C2,
    0x0140, 0x8145, 0x814F, 0x014A, 0x815B, 0x015E, 0x0154, 0x8151,
    0x8173, 0x0176, 0x017C, 0x8179, 0x0168, 0x816D, 0x8167, 0x0162,
    0x8123, 0x0126, 0x012C, 0x8129, 0x0138, 0x813D, 0x8137, 0x0132,
    0x0110, 0x8115, 0x811F, 0x011A, 0x810B, 0x010E, 0x0104, 0x8101,
    0x8303, 0x0306, 0x030C, 0x8309, 0x0318, 0x831D, 0x8317, 0x0312,
    0x0330, 0x8335, 0x833F, 0x033A, 0x832B, 0x032E, 0x0324, 0x8321,
    0x0360, 0x8365, 0x836F, 0x036A, 0x837B, 0x037E, 0x0374, 0x8371,
    0x8353, 0x0356, 0x035C, 0x8359, 0x0348, 0x834D, 0x8347, 0x0342,
    0x03C0, 0x83C5, 0x83CF, 0x03CA, 0x83DB, 0x03DE, 0x03D4, 0x83D1,
    0x83F3, 0x03F6, 0x03FC, 0x83F9, 0x03E8, 0x83ED, 0x83E7, 0x03E2,
    0x83A3, 0x03A6, 0x03AC, 0x83A9, 0x03B8, 0x83BD, 0x83B7, 0x03B2,
    0x0390, 0x8395, 0x839F, 0x039A, 0x838B, 0x038E, 0x0384, 0x8381,
    0x0280, 0x8285, 0x828F, 0x028A, 0x829B, 0x029E, 0x0294, 0x8291,
    0x82B3, 0x02B6, 0x02BC, 0x82B9, 0x02A8, 0x82AD, 0x82A7, 0x02A2,
    0x82E3, 0x02E6, 0x02EC, 0x82E9, 0x02F8, 0x82FD, 0x82F7, 0x02F2,
    0x02D0, 0x82D5, 0x82DF, 0x02DA, 0x82CB, 0x02CE, 0x02C4, 0x82C1,
    0x8243, 0x0246, 0x024C, 0x8249, 0x0258, 0x825D, 0x8257, 0x0252,
    0x0270, 0x8275, 0x827F, 0x027A, 0x826B, 0x026E, 0x0264, 0x8261,
    0x0220, 0x8225, 0x822F, 0x022A, 0x823B, 0x023E, 0x0234, 0x8231,
    0x8213, 0x0216, 0x021C, 0x8219, 0x0208, 0x820D, 0x8207, 0x0202
};



//
// Name: crcstr
//
// Description: computes the CRC-CCITT checksum for a single string.
//
void
crcstr ( register ushort *accum, register unsigned char *str )
{
    *accum = 0;
    while ( *str )
        *accum = (*accum << 8) ^ crctbl[(*accum >> 8) ^ (ushort) *str++];
}

//
// Name: crcblk
//
// Description: computes and accumulates the CRC-CCITT checksum
//              for a block of data. Caller must initialize the
//              "*accum" to zero at the start of data.
//
void
crcblk ( register ushort *accum, register unsigned char *buf,
         register int bytes )
{
    while ( bytes-- )
        *accum = (*accum << 8) ^ crctbl[(*accum >> 8) ^ (ushort) *buf++];
}

//
// Name: crc16str
//
// Description: computes the CRC-16 checksum for a single string.
//
void
crc16str ( register ushort *accum, register unsigned char *str )
{
    *accum = 0;
    while ( *str )
        *accum = (*accum << 8) ^ crc16tbl[(*accum >> 8) ^ (ushort) *str++];
}

//
// Name: crc16blk
//
// Description: computes and accumulates the CRC-16 checksum
//              for a block of data. Caller must initialize the
//              "*accum" to zero at the start of data.
//
void
crc16blk ( register ushort *accum, register unsigned char *buf,
           register int bytes )
{
    while ( bytes-- )
        *accum = (*accum << 8) ^ crc16tbl[(*accum >> 8) ^ (ushort) *buf++];
}
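
The two 256-entry tables above need not be typed in by hand. They can be derived from the generator polynomials with the usual bit-at-a-time construction for a most-significant-bit-first CRC. The sketch below shows one way this might be done. The function names `crc_make_table` and `crc_string` are hypothetical and do not appear in the listing; `crc_string` applies the same update step as the string routines, parameterized on the table.

```c
#include <assert.h>
#include <stddef.h>

/* Build the lookup table for a 16-bit, MSB-first CRC polynomial.
 * Entry i is the remainder left after feeding the single byte i. */
static void crc_make_table(unsigned short poly, unsigned short table[256])
{
    unsigned int i;

    for (i = 0; i < 256; i++) {
        unsigned short crc = (unsigned short) (i << 8);
        int bit;

        for (bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (unsigned short) ((crc << 1) ^ poly);
            else
                crc = (unsigned short) (crc << 1);
        }
        table[i] = crc;
    }
}

/* Checksum a NUL-terminated string, starting from zero, using the
 * table-driven update: shift out the high byte, fold in the next
 * input byte, and look up the combined index. */
static unsigned short crc_string(const unsigned short table[256],
                                 const unsigned char *str)
{
    unsigned short accum = 0;

    assert(str != NULL);
    while (*str)
        accum = (unsigned short) ((accum << 8) ^ table[(accum >> 8) ^ *str++]);
    return accum;
}
```

Calling `crc_make_table(0x1021, t)` yields `t[1] == 0x1021` and `t[255] == 0x1EF0`, and `crc_make_table(0x8005, t)` yields `t[1] == 0x8005` and `t[255] == 0x0202`, which agree with the first and last non-zero entries of the tables in the listing. Generating the tables at start-up trades a small amount of initialization time for not having to maintain the constants in source form.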



